{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第四题：支持向量机的回归任务\n",
    "\n",
    "实验内容：\n",
    "1. 使用支持向量机完成kaggle房价预测问题\n",
    "2. 使用训练集训练模型，计算测试集的MAE和RMSE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 请你使用SVR，完成kaggle房价预测问题\n",
    "\n",
    "要求：使用'LotArea', 'BsmtUnfSF', 'GarageArea'作为特征，完成以下内容的填写\n",
    "\n",
    "###### 双击此处填写\n",
    "\n",
    "核函数 | C | MAE | RMSE\n",
    "- | - | - | - \n",
    "rbf | 0.1 | 56512.86 | 79837.51\n",
    "rbf | 1 | 56500.99 | 79823.69\n",
    "linear | 0.1 | 46125.38 | 77087.49\n",
    "linear | 1 | 60389.64 | 124715.14\n",
    "sigmoid | 0.1 | 56514.07 | 79839.06\n",
    "sigmoid | 1 | 56513.07 | 79839.26"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# 读取数据\n",
    "data = pd.read_csv('data/kaggle_house_price_prediction/kaggle_hourse_price_train.csv')\n",
    "\n",
    "# 使用这3列作为特征\n",
    "features = ['LotArea', 'BsmtUnfSF', 'GarageArea']\n",
    "target = 'SalePrice'\n",
    "data = data[features + [target]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((1022, 3), (1022,), (438, 3), (438,))"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 数据集分割\n",
    "from sklearn.model_selection import train_test_split\n",
    "trainX, testX, trainY, testY = train_test_split(data[features], data[target], test_size = 0.3, random_state = 32)\n",
    "trainX.shape, trainY.shape, testX.shape, testY.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意：计算线性核的时候，要使用 LinearSVR 这个类，不要使用SVR(kernel = 'linear')。LinearSVR不需要设置kernel参数！**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 引入模型\n",
    "from sklearn.svm import SVR\n",
    "from sklearn.svm import LinearSVR\n",
    "from sklearn.metrics import mean_absolute_error\n",
    "from sklearn.metrics import mean_squared_error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rbf：C=0.1\n",
      "mae: 56512.86\n",
      "rmse 79837.51\n"
     ]
    }
   ],
   "source": [
    "# YOUR CODE HERE\n",
    "regr=SVR(kernel='rbf',C=0.1)\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('rbf：C=0.1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "rbf：C=1\n",
      "mae: 56500.99\n",
      "rmse 79823.69\n"
     ]
    }
   ],
   "source": [
    "regr=SVR(kernel='rbf',C=1)\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('rbf：C=1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "linear：C=0.1\n",
      "mae: 46125.38\n",
      "rmse 77087.49\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\anaconda3\\lib\\site-packages\\sklearn\\svm\\_base.py:976: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  warnings.warn(\"Liblinear failed to converge, increase \"\n"
     ]
    }
   ],
   "source": [
    "regr=LinearSVR(C=0.1)\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('linear：C=0.1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "linear：C=1\n",
      "mae: 52140.57\n",
      "rmse 70706.86\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\anaconda3\\lib\\site-packages\\sklearn\\svm\\_base.py:976: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  warnings.warn(\"Liblinear failed to converge, increase \"\n"
     ]
    }
   ],
   "source": [
    "regr=LinearSVR(C=1)\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('linear：C=1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sigmoid：C=0.1\n",
      "mae: 56513.07\n",
      "rmse 79839.26\n"
     ]
    }
   ],
   "source": [
    "regr=SVR(kernel='sigmoid')\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('sigmoid：C=0.1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sigmoid：C=1\n",
      "mae: 56513.07\n",
      "rmse 79839.26\n"
     ]
    }
   ],
   "source": [
    "regr=SVR(kernel='sigmoid')\n",
    "regr.fit(trainX,trainY)\n",
    "prediction=regr.predict(testX)\n",
    "MAE=round(mean_absolute_error(testY,prediction),2)\n",
    "RMSE=round(mean_squared_error(testY,prediction)** 0.5,2)\n",
    "print('sigmoid：C=1')\n",
    "print('mae:',MAE)\n",
    "print('rmse',RMSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**由第一个表格可以看出，线性核函数中不同的C值对平均绝对误差MAE和均方根误差的影响较大。而rbf核函数及sigmoid核函数的两类误差受C值影响小**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
